The Zea mays ssp mays cv B73 Reference Genome
B73 Representative Reference Genome Assembly Status
B73 Representative Reference Genome Assembly Details
Change History
B73 Representative Reference Gene Models and Nomenclature
Gramene Versions
Genome Assembly and Gene Model Issues
B73 Stock Information
Chromosome names - Genbank accessions
Downloads
History of Maize Genome Assemblies and Annotations
The Nomenclature Standards
Publications
FAQs
Coming Soon
Historic information
The Maize B73 Representative Reference Genome
The maize B73 reference genome has been revised four times since its initial release as a BAC-by-BAC assembly in 2009. As of 2016, the maize nomenclature committee has adopted naming standards to accommodate multiple Zea species, multiple accessions, and multiple versions. This recommendation is available here. The B73 reference assemblies have been known by these names:
B73 Representative Reference Genome Assembly Status
The current representative reference genome for maize is B73 Zm-B73-REFERENCE-NAM-5.0 (also known as RefGen_v5). This assembly, released in January 2020, was sequenced and assembled by the NAM Consortium, along with a set of 25 inbreds known as the NAM founder lines, using PacBio long reads and a mate-pair strategy. Scaffolds were validated by BioNano optical mapping, then ordered and oriented using linkage and pan-genome marker data. RNA-seq data from multiple tissues were used to annotate each genome with a pipeline that includes BRAKER, Mikado, and PASA.
The first three assemblies, B73 RefGen_v1, B73 RefGen_v2, and B73 RefGen_v3, were all based on a BAC (bacterial artificial chromosome) sequencing strategy. The B73 RefGen_v4 assembly used a new approach: PacBio Single Molecule Real Time (SMRT) sequencing at Cold Spring Harbor to a depth of 60X coverage, with scaffolds built with the assistance of whole-genome restriction mapping (also called optical mapping). Error correction of the PacBio sequences was facilitated by Illumina short-read DNA sequencing performed at Washington University. Annotation was carried out in the Ware laboratory at Cold Spring Harbor using the MAKER pipeline (Campbell, 2014) and ~111,000 long-read PacBio transcripts from six maize tissues. More complete details on the B73 RefGen_v4 assembly can be found at Gramene or in the paper. See the History of Maize Genome Assemblies and Annotations for more information.
B73 Representative Reference Genome Assembly Details
The current version is Zm-B73-REFERENCE-NAM-5.0, also known as "B73 RefGen_v5".
Zm-B73-REFERENCE-NAM-5.0/Zm00001eb.1 Information
In-depth metadata for Zm-B73-REFERENCE-NAM-5.0 is available here. See the paper for B73 RefGen_v1 here, and for Zm-B73-REFERENCE-GRAMENE-4.0 here.
Counts for each chromosome.
Chromosome | Accession | Length (bp) | Protein coding | Transposable element
Chromosome 1 | LR618874.1 | 308,452,471 | 5892 | 227,345
Chromosome 2 | LR618875.1 | 243,675,191 | 4751 | 176,504
Chromosome 3 | LR618876.1 | 238,017,767 | 4103 | 173,251
Chromosome 4 | LR618877.1 | 250,330,460 | 4093 | 183,689
Chromosome 5 | LR618878.1 | 226,353,449 | 4485 | 160,922
Chromosome 6 | LR618879.1 | 181,357,234 | 3412 | 129,220
Chromosome 7 | LR618880.1 | 185,808,916 | 3070 | 141,993
Chromosome 8 | LR618881.1 | 182,411,202 | 3536 | 130,992
Chromosome 9 | LR618882.1 | 163,004,744 | 2988 | 117,200
Chromosome 10 | LR618883.1 | 152,435,371 | 2705 | 112,766
Unmapped | - | - | 5892 | 23,216
Nuclear Total | - | ~2,182,000 kb | 39,756 | 1,577,104
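To work with the per-chromosome counts programmatically (for example, to total them or compare them with another assembly), a small sketch is enough. The values below are transcribed by hand from the Zm-B73-REFERENCE-NAM-5.0 table; the dictionary layout and the names `NAM5_CHROMOSOMES` and `chromosome_totals` are illustrative only. The sums cover the ten numbered chromosomes and exclude unmapped scaffolds.

```python
# Values transcribed from the Zm-B73-REFERENCE-NAM-5.0 table:
# chromosome -> (length in bp, protein-coding genes, transposable elements)
NAM5_CHROMOSOMES = {
    1: (308_452_471, 5892, 227_345),
    2: (243_675_191, 4751, 176_504),
    3: (238_017_767, 4103, 173_251),
    4: (250_330_460, 4093, 183_689),
    5: (226_353_449, 4485, 160_922),
    6: (181_357_234, 3412, 129_220),
    7: (185_808_916, 3070, 141_993),
    8: (182_411_202, 3536, 130_992),
    9: (163_004_744, 2988, 117_200),
    10: (152_435_371, 2705, 112_766),
}

COLUMNS = ("length_bp", "protein_coding", "transposable_elements")


def chromosome_totals(table):
    """Sum each column over the numbered chromosomes (unmapped scaffolds excluded)."""
    return {name: sum(row[i] for row in table.values()) for i, name in enumerate(COLUMNS)}


if __name__ == "__main__":
    for name, total in chromosome_totals(NAM5_CHROMOSOMES).items():
        print(f"{name}: {total:,}")
```

Run as a script, this prints the ten-chromosome sums: 2,131,846,805 bp, 39,035 protein-coding genes, and 1,553,882 transposable elements. The Nuclear Total row additionally includes the unmapped scaffolds.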
Annotations: Zm00001eb.1, NCBI 103
Zm-B73-REFERENCE-NAM-5.0/Zm00001eb.1 Stats
Gene feature | Value
Average protein-coding transcript size | 5376 bp
Longest transcript | 745,091 bp (Zm00001eb334630_T004)
Average transposable element size | 1638 bp
Average exon size | 290 bp
Average number of exons per gene | 6 exons
Maximum exons per gene | 80 exons (Zm00001eb126710_T002)
Average coding region size | 1816 bp
Previous reference genome assemblies (Zm-B73-REFERENCE-GRAMENE-4.0)
Zm-B73-REFERENCE-GRAMENE-4.0/Zm00001d.2 Information
In-depth metadata for Zm-B73-REFERENCE-GRAMENE-4.0 is available here. See the paper for B73 RefGen_v1 here, and for Zm-B73-REFERENCE-GRAMENE-4.0 here.
Counts for each chromosome.
Chromosome
Accession
Protein Coding
miRNA
Transposable Element
Low Confidence
Chromosome 1
NC_024459.2
5905
14
2209
Chromosome 2
NC_024460.2
4737
22
2209
Chromosome 3
NC_024461.2
4737
16
1571
Chromosome 4
NC_024462.2
4115
20
1826
Chromosome 5
NC_024463.2
4480
24
1681
Chromosome 6
NC_024464.2
3290
11
1223
Chromosome 7
NC_024465.2
3108
10
1193
Chromosome 8
NC_024466.2
3561
13
1288
Chromosome 9
NC_024467.2
2973
7
1191
Chromosome 10
NC_024468.2
2684
17
1034
Unmapped
319
0
357
Nuclear Total
39,324
154
15,516
Annotations: Zm00001d.2
Zm-B73-REFERENCE-GRAMENE-4.0/Zm00001d Stats
Gene feature | Value
Average protein-coding transcript size | 7638 bp
Average low-confidence transcript size | 6981 bp
Average transposable element size | unavailable
Average exon size | 156 bp
Average number of exons per gene | 4 exons
Maximum exons per gene | 81 exons (Zm00001d040166)
Average intron size | 578 bp
Average coding region size | 207 bp
Assembly process:
In-depth metadata for B73 RefGen_v3 is available here.
Detailed information about the V3 assembly process is available at .
B73 RefGen_v3 Information
Counts for each chromosome.
Chromosome | Accession | Protein coding | miRNA | Transposable element | Low confidence
Chromosome 1 | NC_024459.1 | 6007 | 15 | 4296 | 6044
Chromosome 2 | NC_024460.1 | 4742 | 23 | 3582 | 4997
Chromosome 3 | NC_024461.1 | 4174 | 16 | 3093 | 4352
Chromosome 4 | NC_024462.1 | 4182 | 21 | 3668 | 4688
Chromosome 5 | NC_024463.1 | 4473 | 22 | 3103 | 4249
Chromosome 6 | NC_024464.1 | 3278 | 10 | 2502 | 3430
Chromosome 7 | NC_024465.1 | 3115 | 10 | 2424 | 3274
Chromosome 8 | NC_024466.1 | 3505 | 13 | 2593 | 3508
Chromosome 9 | NC_024467.1 | 2991 | 8 | 2404 | 3288
Chromosome 10 | NC_024468.1 | 2688 | 18 | 2268 | 2734
Unmapped | - | 146 | 0 | 51 | 59
Nuclear Total | - | 39,475 | 156 | 29,996 | 40,680
Annotation: B73 RefGen_v3 gene model set 5b+
B73 RefGen_v3 Stats
Gene feature | Value
Average protein-coding transcript size | 4255 bp
Average low-confidence transcript size | 959 bp
Average transposable element size | 1694 bp
Average exon size | 287 bp
Average number of exons per gene | 3.6 exons
Maximum exons per gene | 35 exons (GRMZM2G068755_T01)
Average intron size | 630 bp
Average coding region size | 213 bp
B73 RefGen_v2 Information
In-depth metadata for B73 RefGen_v2 is available here.
Counts for each chromosome.
Chromosome | Working gene set (WGS) models | WGS transcripts | Filtered gene set (FGS) models | FGS transcripts
Chromosome 1 | 16,344 | 20,556 | 6,056 | 9,899
Chromosome 2 | 13,284 | 16,387 | 4,766 | 7,485
Chromosome 3 | 11,613 | 14,383 | 4,197 | 6,650
Chromosome 4 | 12,517 | 15,463 | 4,197 | 6,822
Chromosome 5 | 11,828 | 14,920 | 4,503 | 7,319
Chromosome 6 | 9,207 | 11,458 | 3,293 | 5,263
Chromosome 7 | 8,813 | 10,965 | 3,147 | 5,081
Chromosome 8 | 9,633 | 12,085 | 3,531 | 5,695
Chromosome 9 | 8,347 | 10,313 | 2,920 | 4,690
Chromosome 10 | 7,718 | 9,463 | 2,727 | 4,274
Unmapped | 157 | 170 | 52 | 62
Nuclear Total | 109,461 | 136,163 | 39,389 | 63,240
Mitochondria | 171 | 175 | 124 | 127
Chloroplast | 72 | 73 | 57 | 58
Total | 109,704 | 136,411 | 39,570 | 63,425
Annotation: B73 RefGen_v2 Release 5b.60
B73 RefGen_v2 Stats
Gene feature | Value
Average WGS transcript size | 2646 bp
Average FGS transcript size | 4237 bp
Average exon size | 287 bp
Average number of exons per gene | 3.6 exons
Maximum exons per gene | 53 exons (GRMZM2G068755_T01)
Average intron size | 629 bp
Average coding region size | 210 bp
Average 5' UTR length | 280 bp
Average 3' UTR length | 336 bp
B73 RefGen_v1 Information
In-depth metadata for B73 RefGen_v1 is available here.
Change history
B73 RefGen_v1: First complete assembly of the B73 genome.
B73 RefGen_v2: Improvements to the order and orientation of within-BAC contigs using the minimum tiling path (MTP). Improvements to gene models.
B73 RefGen_v3: Captured missing gene space using WGS reads. 213 new gene models were introduced, 251 gene models were improved, and 10 gene models were merged to create new models:
GRMZM2G000964, GRMZM2G103315 -> GRMZM2G000964
GRMZM2G045892, GRMZM2G452386 -> GRMZM2G045892
GRMZM2G119720, GRMZM2G518717 -> GRMZM2G119720
GRMZM2G142383, GRMZM2G020429 -> GRMZM2G142383
GRMZM2G319465, GRMZM2G439578 -> GRMZM2G319465
GRMZM2G338693, GRMZM2G117517 -> GRMZM2G338693
GRMZM5G861997, GRMZM5G864178 -> GRMZM5G861997
GRMZM5G872800, GRMZM2G143862 -> GRMZM5G872800
GRMZM5G891969, GRMZM5G823855 -> GRMZM5G891969
Zm-B73-REFERENCE-GRAMENE-4.0: A de novo assembly using PacBio technologies. New annotation analysis, with gene models linked to v3 gene models.
Zm-B73-REFERENCE-NAM-5.0: A de novo assembly using PacBio Sequel sequencing technology. Scaffolds validated by improved BioNano optical mapping. New annotation analysis.
B73 Reference Gene Models and Nomenclature
With increasing numbers of full reference genomes with structural
annotation becoming available, it has become necessary to establish naming
standards that span genomes and versions. The recommendation is available
here.
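Taking the two assembly names on this page, Zm-B73-REFERENCE-GRAMENE-4.0 and Zm-B73-REFERENCE-NAM-5.0, as examples, an assembly name splits on dashes into five fields. The sketch below assumes the fields mean species, accession, assembly type, source group, and version; that reading is inferred from these two examples rather than quoted from the recommendation, so check the standard before relying on it. The `AssemblyName` type and both helper functions are hypothetical.

```python
from typing import NamedTuple, Tuple


class AssemblyName(NamedTuple):
    species: str    # e.g. "Zm" for Zea mays
    accession: str  # e.g. "B73"
    kind: str       # e.g. "REFERENCE"
    source: str     # e.g. "NAM" or "GRAMENE" (assumed: producing group/project)
    version: str    # e.g. "5.0"


def parse_assembly_name(name: str) -> AssemblyName:
    """Split a name such as Zm-B73-REFERENCE-NAM-5.0 into its assumed fields."""
    parts = name.split("-")
    if len(parts) < 5:
        raise ValueError(f"expected at least 5 dash-separated fields: {name!r}")
    # Read the fixed fields from both ends so that a (hypothetical) accession
    # containing '-' is kept intact in the middle.
    species, *accession, kind, source, version = parts
    return AssemblyName(species, "-".join(accession), kind, source, version)


def version_key(name: str) -> Tuple[int, ...]:
    """Sort key that compares versions numerically, so "10.0" sorts after "5.0"."""
    return tuple(int(x) for x in parse_assembly_name(name).version.split("."))


names = ["Zm-B73-REFERENCE-GRAMENE-4.0", "Zm-B73-REFERENCE-NAM-5.0"]
print(parse_assembly_name(names[1]))
print(max(names, key=version_key))  # the newest assembly by version number
```

Comparing versions as integer tuples rather than strings avoids the usual pitfall where "10.0" would sort before "5.0".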
Gene model sets (annotations) by reference assembly version:
The Zm00001eb.1 gene model set is the recommended gene model set
for Zm-B73-REFERENCE-NAM-5.0 and is the representative gene model set
for maize.
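The identifiers on this page follow a visible pattern: Zm00001c for the RefGen_v3 assembly, Zm00001d for v4, Zm00001e and Zm00001eb for v5, and gene or transcript IDs such as Zm00001d040166 and Zm00001eb334630_T004. A minimal parser is sketched below. It assumes, from these examples only, that "00001" is a genome number, the first letter encodes the assembly version (c = 3, d = 4, e = 5), an optional second letter marks the annotation release, six digits give the gene number, and `_T###` gives the transcript. The regular expression and the `GeneId` fields are illustrative, not taken from the Nomenclature Standards. Older GRMZM-style IDs are deliberately rejected.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout: Zm + 5-digit genome number + assembly letter
# + optional annotation letter + 6-digit gene number + optional _T### transcript.
GENE_ID = re.compile(r"^Zm(\d{5})([a-z])([a-z]?)(\d{6})(?:_T(\d{3}))?$")


class GeneId(NamedTuple):
    genome: str                # "00001" for B73 (assumed meaning)
    assembly_version: int      # c -> 3, d -> 4, e -> 5
    annotation: str            # "b" in Zm00001eb...; "" when absent
    gene_number: str           # kept as text to preserve leading zeros
    transcript: Optional[int]  # 4 for _T004; None for a gene-level ID


def parse_gene_id(identifier: str) -> GeneId:
    match = GENE_ID.match(identifier)
    if match is None:
        raise ValueError(f"not a Zm00001-style gene ID: {identifier!r}")
    genome, assembly, annotation, number, transcript = match.groups()
    return GeneId(
        genome=genome,
        # Position of the letter in the alphabet: a -> 1, ..., e -> 5.
        assembly_version=ord(assembly) - ord("a") + 1,
        annotation=annotation,
        gene_number=number,
        transcript=int(transcript) if transcript else None,
    )


print(parse_gene_id("Zm00001eb334630_T004"))
print(parse_gene_id("Zm00001d040166").assembly_version)
```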
For more information, see the Nomenclature Standards.
Alternative annotations
Additional annotations for the B73 genome assemblies have been generated by groups outside the genome sequencing project. The outside annotations listed below are shown as tracks on the assembly browsers.
Description of Gramene/Ensembl versions of B73 genome download files
Versions supported by MaizeGDB are in bold. No changes were made to unmasked or masked assembly downloads unless noted.
B73 Reference Genome Assembly and Gene Model Issues
We need your help! Please report any assembly or gene model structure problems. This includes misassembled regions, evidence for closing gaps, gene models that should be merged or split, evidence supporting low-confidence gene models, et cetera. All issues will be shared with the maize community and with the team charged with improving the B73 assembly and gene models.
Please contact MaizeGDB for information about open gene model issues.
All open assembly issues
B73 Stock Information
The seed source for both Zm-B73-REFERENCE-NAM-5.0 and
Zm-B73-REFERENCE-GRAMENE-4.0 descended from PI 550473, but was
maintained for several generations prior to being used as the
source seed. The seeds closest to those used for sequencing v4 were deposited
at the NCRPIS (accession number: PI 677128).
Chromosome - GenBank accessions
Reference chromosome | B73 RefGen_v1 | B73 RefGen_v2 | B73 RefGen_v3 | B73 RefGen_v4 | B73 RefGen_v5 | Publication
WGS (Whole Genome Shotgun) records at GenBank: Zm-B73-REFERENCE-GRAMENE-4.0, Zm-B73-REFERENCE-NAM-5.0
B73 Assembly and Gene Model Downloads
Gramene files currently hosted at MaizeGDB correspond to Gramene version 36. See the summary of Gramene versions above.
WGS (Whole Genome Shotgun) records at GenBank: Zm-B73-REFERENCE-GRAMENE-4.0, Zm-B73-REFERENCE-NAM-5.0
V4 functional annotation from Phytozome 10 (login required)
Cross reference for GRMZM and ZEAMMB73 IDs
Publications
Hufford et al., 2021. De novo assembly, annotation, and comparative analysis of 26 diverse maize genomes. (Preprint)
Jiao et al., 2017. Improved maize reference genome with single-molecule technologies.
Law et al., 2015. Automated update, revision, and quality control of the maize genome annotations using MAKER-P improves the B73 RefGen_v3 gene models and identifies new genes.
Wei et al., 2009. The physical and genetic framework of the maize B73 genome.
Schnable et al., 2009. The B73 maize genome: complexity, diversity, and dynamics.
Wei et al., 2007. Physical and Genetic Structure of the Maize Genome Reflects Its Complex Evolutionary History.
Bi et al., 2006. Single Nucleotide Polymorphisms and Insertion–Deletions for Genetic Markers and Anchoring the Maize Fingerprint Contig Physical Map.
Gardiner et al., 2004. Anchoring 9,371 maize expressed sequence tagged unigenes to the bacterial artificial chromosome contig map by two-dimensional overgo hybridization.
Coe et al., 2002. Access to the Maize Genome: An Integrated Physical and Genetic Map.
Yim et al., 2002. Characterization of Three Maize Bacterial Artificial Chromosome Libraries toward Anchoring of the Physical Map to the Genetic Map Using High-Density Bacterial Artificial Chromosome Filter Hybridization.
FAQs
What is a Reference Genome? A Reference Genome is a haploid representation of a genome as DNA sequence with a defined coordinate system, and accession and version identification. A Reference Genome is usually assembled de novo, rather than by relying on related genomes to assemble small DNA fragments (which would be a reference-guided assembly). A Reference Genome usually includes the structural annotations, or gene models, derived from the sequence assembly. A Reference Genome is almost always a work in progress that improves as new data are added over time. Data for improvement are collected continually, and at certain times new Reference Genome versions are released that incorporate these data. B73 RefGen_v3 is such an updated version.
What is a Representative Genome? A Representative Genome is a reference-quality genome that is considered representative of a species. B73 is the representative maize genome.
What are the main changes between RefGen_v4 (Zm-B73-REFERENCE-GRAMENE-4.0) and Zm-B73-REFERENCE-NAM-5.0? Zm-B73-REFERENCE-NAM-5.0 is a de novo assembly using improved PacBio long-read technology and BioNano optical maps, built from the same tissue source as RefGen_v4 (Zm-B73-REFERENCE-GRAMENE-4.0).
What are the main changes between RefGen_v3 and RefGen_v4 (Zm-B73-REFERENCE-GRAMENE-4.0)? Zm-B73-REFERENCE-GRAMENE-4.0 was a complete de novo assembly using PacBio technology on DNA extracted from a descendant of the accession used for the v1 to v3 assemblies.
What are the main changes between RefGen_v2 and RefGen_v3? Changes to the assembly include:
Why was a preliminary v5 annotation released? A preliminary annotation, Zm00001e.1, was released alongside the v5 genome assembly to put tools into the hands of researchers as soon as possible, but with warnings not to rely on any specific gene model until the formal annotation, Zm00001eb.1, was released.
How can I map positions between the v4 and v5 assemblies? No converter is available yet for translating between v4 and v5 positions, but chain files are available here; they can be used with LiftOver or CrossMap to convert sets of coordinates. Be aware that features on the unplaced scaffolds in the v4 assembly will not be correctly translated to the v5 assembly.
How can I map positions between the v2 and v3 assemblies? Use the Ensembl assembly converter tool at Gramene.
Where can I find legacy resources from MaizeSequence.Org? At the Gramene FTP archive.
How can I identify the Filtered Gene Set (FGS) in RefGen_v3? In the 5b+ gene build, the former FGS gene models are labeled as protein-coding.
Where can I download a GFF dump of the FGS for maize genes in v3 (5b+)? From the Gramene 5b+ FTP folder.
Coming soon to MaizeGDB
Updated September 28th, 2021
Tracks
Page updates
Tools
Features
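The FAQ answer on mapping positions between the v4 and v5 assemblies points to chain files for use with LiftOver or CrossMap. To show roughly what those tools do with such a file, here is a minimal Python sketch that parses the UCSC chain format (a header line `chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id`, followed by `size dt dq` block lines and a final `size` line) and maps a single 0-based position from the old (target) assembly to the new (query) assembly. It is not a substitute for LiftOver or CrossMap: it returns the first matching block, ignores chain scores, and converts positions rather than intervals or feature strands. The chain text in the example is a made-up toy, not the MaizeGDB v4-to-v5 file.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Tuple


@dataclass
class Chain:
    t_name: str
    q_name: str
    q_size: int
    q_strand: str
    # Aligned blocks as (target_start, query_start, size), 0-based, half-open.
    blocks: List[Tuple[int, int, int]] = field(default_factory=list)


def parse_chains(lines: Iterable[str]) -> List[Chain]:
    chains: List[Chain] = []
    t = q = 0
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        if fields[0] == "chain":
            # chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
            chains.append(Chain(fields[2], fields[7], int(fields[8]), fields[9]))
            t, q = int(fields[5]), int(fields[10])
        else:
            # Block line "size dt dq", or just "size" for the last block of a chain.
            size = int(fields[0])
            chains[-1].blocks.append((t, q, size))
            if len(fields) == 3:
                t += size + int(fields[1])
                q += size + int(fields[2])
    return chains


def lift_position(chains: List[Chain], chrom: str, pos: int) -> Optional[Tuple[str, int]]:
    """Map a 0-based position on the old assembly; None if it falls outside all blocks."""
    for chain in chains:
        if chain.t_name != chrom:
            continue
        for t_start, q_start, size in chain.blocks:
            if t_start <= pos < t_start + size:
                q = q_start + (pos - t_start)
                if chain.q_strand == "-":
                    # '-' chains count query coordinates from the end of the sequence.
                    q = chain.q_size - 1 - q
                return chain.q_name, q
    return None


toy = """chain 1000 chr1 1000 + 0 310 chr1 1200 + 100 400 1
100 10 0
200
"""
chains = parse_chains(toy.splitlines())
print(lift_position(chains, "chr1", 50))   # inside the first block
print(lift_position(chains, "chr1", 105))  # inside the 10 bp target-side gap
```

Positions that fall in a gap between aligned blocks have no counterpart and come back as `None`, which mirrors how liftover tools report unmapped features.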
Information about assembly B73 RefGen_v3 (also known as AGPv3)
Assembly identifier: Zm00001c
Genome Sequencing Project Information
Stock and Biosample Information
Sequencing and Assembly Information
Genome coverage: 6x
Construction of pseudomolecules: Map-based order and orientation of a BAC tiling path, with some gaps filled with 454 contigs.
https://download.maizegdb.org/B73_RefGen_v3/
ftp://ftp.ncbi.nlm.nih.gov/genomes/genbank/plant/Zea_mays/all_assembly_versions/GCA_000005005.5_B73_RefGen_v3
A scaffold is a set of ordered and oriented contigs that are linked to one another by mate pairs of sequencing reads.
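The scaffold definition above can be made concrete: once contigs are ordered and oriented, the scaffold sequence is their concatenation, with reverse-oriented contigs reverse-complemented and the estimated gaps between them filled with N. The sketch below assumes the layout (contig order, orientation, and gap length) has already been inferred, for example from mate-pair insert sizes; this is similar in spirit to what an AGP file records. The contig names, sequences, and gap sizes are hypothetical.

```python
from typing import Dict, List, Tuple

# Complement table; N (unknown base) maps to itself.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_scaffold(contigs: Dict[str, str], layout: List[Tuple[str, str, int]]) -> str:
    """Join ordered, oriented contigs into one scaffold sequence.

    layout holds (contig name, orientation "+" or "-", gap length in bp that
    follows the contig); each gap is filled with that many N characters.
    """
    pieces = []
    for name, orientation, gap in layout:
        seq = contigs[name]
        if orientation == "-":
            seq = reverse_complement(seq)
        elif orientation != "+":
            raise ValueError(f"orientation must be '+' or '-', got {orientation!r}")
        pieces.append(seq)
        pieces.append("N" * gap)
    return "".join(pieces)


contigs = {"ctgA": "ACGT", "ctgB": "GGCA"}
print(build_scaffold(contigs, [("ctgA", "+", 3), ("ctgB", "-", 0)]))  # ACGTNNNTGCC
```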